[state sync] migrate to new bucket format#25925
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
8aca1df to
bd8ab7a
Compare
bd8ab7a to
a61219e
Compare
a61219e to
b0770b1
Compare
b0770b1 to
85efc3b
Compare
amnn
left a comment
There was a problem hiding this comment.
@phoenix-o, could you motivate your use of the indexing framework here, as well as some of the changes you need to make to it?
- Why does state sync need those changes, and are those needs general to all users of the indexing framework or specific to state sync?
- How much of the indexing framework are you using?
It's possible/likely that the best option is to not use the indexing framework, but implement a small/custom interface over object_store instead that caters exactly to the node software's needs.
|
@amnn: State sync currently uses the old ingestion framework. This PR migrates it(last callsite) and utilizes the new checkpoint format. |
If only
Separately, I think this issue has come up before, so maybe it is worth solving in the framework, even if state sync doesn't use it:
|
|
@amnn: sounds good, I’ll revert the changes to |
85efc3b to
a75e891
Compare
amnn
left a comment
There was a problem hiding this comment.
Thanks @phoenix-o, this is now mostly outside of my domain, so will leave the rest of the review to @bmwill or @nickvikeras
crates/sui-network/Cargo.toml
Outdated
| sui-simulator.workspace = true | ||
| sui-protocol-config.workspace = true | ||
| sui-data-ingestion-core.workspace = true | ||
| sui-indexer-alt-framework.workspace = true |
There was a problem hiding this comment.
Should be able to get rid of this now, right?
There was a problem hiding this comment.
I am a little confused at the state of the PR. It looks like Ashok's preference is to move off of the indexing framework's ingestion client and use object_store directly, but we are still using the ingest client in this revision.
There was a problem hiding this comment.
Yes, I’m still using the client. Only reverted the changes to the client itself (i.e. I construct object store instance outside)
There was a problem hiding this comment.
At that point, why use the ingestion client? You are pulling in all the code for building indexers (and which will one day not even be in the mono-repo) but all you need is the naming scheme for fetching a checkpoint, right?
There was a problem hiding this comment.
the point is to minimize code duplication. We’ll need the same functionality in a couple of other places (i.e. sui-tool). Maybe it should live in some other common place. But ok, removing the sui-indexer-alt-framework dependency from this PR
This was for reading out of the parquet bucket, not the checkpoint bucket, but the code is here - sui/crates/sui-analytics-indexer/src/indexer.rs Lines 151 to 169 in 67116b9 I think we want to do something similar in the indexing framework so we can enable requester pays on our public checkpoint buckets. |
a75e891 to
ebd302f
Compare
| remote_store_options: Vec<(String, String)>, | ||
| ) -> Arc<dyn ObjectStore> { | ||
| let timeout_secs = 5; | ||
| let client_options = ClientOptions::new() |
There was a problem hiding this comment.
Do we need to add .with_default_headers() to set the requester pays headers?
There was a problem hiding this comment.
Don’t think so. The current sui-data-ingestion-core client doesn’t set it and works with requester-pays buckets. But once accepted, I’ll state-sync the mainnet FN with the new binary and verify that it works
There was a problem hiding this comment.
yeah, spun up FN locally. It syncs from the S3 requester-pays bucket
Description
migrates state sync to the new bucket format
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.